Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend functionality of Sequential Feature Selector to allow repeating cross-validation. #272

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

nd26
Copy link

@nd26 nd26 commented Oct 31, 2017

Description

Added ‘n_cv_repeats’ parameter and extended functionality with RepeatedStratifiedKFold and RepeatedKFold to allow repeating cross-validation for both classifiers and regressors respectively.

Related issues or pull requests

Fixes #271

Pull Request requirements

  • Added appropriate unit test functions in the ./mlxtend/*/tests directories
  • Ran nosetests ./mlxtend -sv and make sure that all unit tests pass
  • Checked the test coverage by running nosetests ./mlxtend --with-coverage
  • Checked for style issues by running flake8 ./mlxtend
  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file
  • Modify documentation in the appropriate location under mlxtend/docs/sources/ (optional)
  • Checked that the Travis-CI build passed at https://travis-ci.org/rasbt/mlxtend

@pep8speaks
Copy link

pep8speaks commented Oct 31, 2017

Hello @nd26! Thanks for updating the PR.

Line 21:80: E501 line too long (91 > 79 characters)
Line 24:1: E302 expected 2 blank lines, found 1
Line 28:80: E501 line too long (105 > 79 characters)
Line 30:80: E501 line too long (95 > 79 characters)
Line 30:96: W291 trailing whitespace
Line 33:80: E501 line too long (92 > 79 characters)
Line 111:78: W291 trailing whitespace
Line 112:77: W291 trailing whitespace
Line 113:77: W291 trailing whitespace
Line 138:30: E251 unexpected spaces around keyword / parameter equals
Line 138:32: E251 unexpected spaces around keyword / parameter equals
Line 164:80: E501 line too long (86 > 79 characters)
Line 166:80: E501 line too long (87 > 79 characters)
Line 169:9: E303 too many blank lines (2)

Comment last updated on November 01, 2017 at 15:51 Hours UTC

@rasbt
Copy link
Owner

rasbt commented Oct 31, 2017

Thanks for the PR! Based on the test results, it just seems to be a Tab vs. Whitespace error. I.e., try to replace tabs by 4 whitespaces each.

Btw. you can always test your code locally by running nosetests mlxtend/feature_selection/, for example.

@nd26
Copy link
Author

nd26 commented Oct 31, 2017

No problem! Thanks for the reply. Will do tomorrow morning :)

@rasbt
Copy link
Owner

rasbt commented Nov 1, 2017

Sure, no hurry!

Btw. added an issue regarding the random number generator you mentioned via email (#274). I am not sure what's more convenient, adding the Repeated CV feature or the random seed first. But I think adding repeated CV (using the current cross_val_score implementation) should work without an extra random seed. After that is implemented, I am happy to rewrite the code to support random seeds.

However, I just wanted to mention it in case the repeated CV causes troubles without random seeds (i.e., irreproducible unit tests). In that case, we may want to add random seeds first and then the repeated CV.

Anyways, thanks a lot for contributing, both repeated CV and the random seeds are useful enhancements!

@nd26
Copy link
Author

nd26 commented Nov 1, 2017

Most likely, if there are unit tests that test the n_cv_repeats parameter, they will fail due to the lack of reproducibility (SFS is highly unstable when used against a large number of features). But, current tests only test cross_val_score without using different values for n_cv_repeats, right?

No problem man!

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.08%) to 90.828% when pulling 2a5c9c5 on nd26:feature/RepeatedSKFold271 into 3028be8 on rasbt:master.

@rasbt
Copy link
Owner

rasbt commented Nov 1, 2017

I just see the reason why one of the unit tests is failing is because one of them is running sklearn 0.18, which doesn't yet have the RepeatedKFold class (it was added in sklearn 0.19).

I think it's fair to switch all the unit tests to sklearn >= 0.19 as it has been out now for a while.

The way to fix this would be to change SKLEARN_VERSION="0.18" in the .travis.yml file to SKLEARN_VERSION="0.19" in the section

- python: 3.5
          env: LATEST="false" COVERAGE="false" NUMPY_VERSION="1.11.2" SCIPY_VERSION="0.18.1" SKLEARN_VERSION="0.18" PANDAS_VERSION="0.19.1"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants